Multilingual electronic transfer dictionary containing topical codes and method of use

ABSTRACT

A multilingual electronic transfer dictionary provides for automatic topic disambiguation by including one or more topic codes in definitions contained in the dictionary. Automatic topic disambiguation is accomplished by determining the frequencies of topic codes within a block of text. Dictionary entries having more frequently occurring topic codes are preferentially selected over those having less frequently occurring topic codes. When the topic codes are members of a hierarchical topical coding system, such as the International Patent Classification system, an iterative method can be used that starts with a coarser level of the coding system and is repeated at finer levels until an ambiguity is resolved. The dictionary is advantageously used for machine translation, e.g. between Japanese and English.

COPYRIGHT

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the PTO patent file or records, but otherwise reserves all copyright rights whatsoever. Copyright © 1999 ISTA.

TECHNICAL FIELD

The present invention relates to multilingual electronic dictionaries that may be used for machine translation.

DEFINITIONS

“Multilingual” means pertaining to two or more languages.

Unless the context otherwise requires, the terms “subject”, “topic” and “field” are virtually synonymous in this disclosure, as are the terms “dictionary”, “glossary” and “lexicon.”

BACKGROUND ART

The existence of field-dependent translations of terms has long been a problem for both ordinary human translation and for machine translation. A term in a source language, for example, Japanese, may have more than one translation in a target language, for example, English, depending on the subject, topic or field of the document being translated. For example, the word “soshiki” in Japanese would be translated to the English “tissue” in a medical document, to the English “weave” in the case of textiles, or to the English “microstructure” in the case of metallurgy.

Conventional machine translation programs, for example, Systran®, contain topical dictionaries or glossaries. The user must manually select topical dictionaries appropriate for the document being translated. In this case, there is one dictionary per topic, for example, chemistry or medicine, rather than one topic per dictionary entry or record as in the current invention.

The machine translation program METAL contains three individual lexicons: a German monolingual lexicon, an English monolingual lexicon and a German-English bilingual lexicon (Katherine Koch, “Machine Translation and Terminology Database—Uneasy Bedfellows?” Lecture Notes in Artificial Intelligence 898, Machine Translation and the Lexicon, Petra Steffens, ed., Springer, Berlin, 1995, pp. 131-140). Semantic information is disclosed only for the monolingual lexicons, not for the bilingual lexicon. Even for the monolingual lexicons, only 15 semantic types are disclosed, such as “abstract”, “concrete”, “human”, “animal” and “process.” These are quite different from the topical classifications that are the subject of the current invention.

Brigitte Blaser, in “TransLexis: An Integrated Environment for Lexicon and Terminology Management,” Lecture Notes in Artificial Intelligence 898, Machine Translation and the Lexicon, Petra Steffens, ed., Springer, Berlin, 1995, pp. 158-173, discloses the incorporation of concepts, including broader concepts, narrower concepts and related concepts, in a lexicon database management system for machine translation. However, this disclosure does not extend to the incorporation of subject codes, notably hierarchical subject codes, in a multilingual electronic dictionary, nor does it disclose the use of concepts or other subject area information for automatic topic discrimination in machine translation. Notably, these concepts are not subject areas; rather, they constitute the interlingua for interlingua-based machine translation.

Masterson disclosed a means of automatic sense disambiguation for the machine translation of Latin to English in the article “The thesaurus in syntax and semantics,” Mechanical Translation, Vol. 4, pp. 1-2, 1957. As described by Wilks, Slator, and Guthrie in Electric Words: Dictionaries, Computers and Meanings, MIT Press, Cambridge, Mass., 1996, pp. 88-89, Masterson disclosed a nonstatistical method using the headings in Roget's Thesaurus.

In this predecessor to interlingua-based machine translation, Masterson disclosed a concept thesaurus for the words in a Latin passage from Virgil's Georgics. Each word stem from the Latin passage was associated with a set of head numbers from Roget's International Thesaurus by translating the word stems into English and selecting the head numbers for the corresponding English words. For example, the three Latin noun stems, “agricola”, “terram” and “aratro” have the following heads (where the head words are shown instead of the head numbers):

-   AGRICOLA: Region, Agriculture
-   TERRAM: Region, Land, Furrow
-   ARATRO: Agriculture, Furrow, Convolution

In the case of the text, “Agricola incurvo in terram dimovit aratro”, the heads that occur more than once are selected into a concept set. In the above example, this yields the following sets:

-   AGRICOLA: Region, Agriculture
-   TERRAM: Region, Furrow
-   ARATRO: Agriculture, Furrow

Finally, the English words listed under each head in Roget's Thesaurus are intersected to leave the appropriate translation candidates. In the current example, this yields the following sets:

-   AGRICOLA: farmer, ploughman
-   TERRAM: soil, ground
-   ARATRO: plough, ploughman, rustic

Masterson does not disclose a multilingual dictionary, nor does she disclose use of topical codes in a multilingual dictionary for disambiguation.

Kenneth W. Church, William A. Gale and David E. Yarowsky, in U.S. Pat. No. 5,541,836, also disclosed the use of the categories from Roget's Thesaurus in automatically disambiguating word/sense pairs and the use of bilingual bodies of text to train word/sense probability tables. Church et al. do not disclose a multilingual dictionary, nor do they disclose the use of topical codes in a multilingual dictionary for sense disambiguation.

JuneJei Kuo, in U.S. Pat. No. 5,285,386, “Machine Translation Apparatus Having Means for Translating Polysemous Words Using Dominated Codes”, discloses interlingua-based machine translation using semantic codes in the role of the interlingua. While Kuo discloses transfer dictionaries, these are not multilingual transfer dictionaries. Rather, they are transfer dictionaries between the semantic codes, the interlingua in this case, and words in the target language.

The requirement of manually selecting a topical dictionary is a barrier to the automated translation of documents such as patent documents that cover many topical areas. Also, the semantic methods of the interlingua-based approaches do not provide for automatically determining the topic of the document being translated. There is a need for a means for automatically determining the most appropriate target definition depending on the topic of the document. Such a means is referred to as “automatic topic disambiguation” in the text below.

Elizabeth Liddy, Woojin Palk and Edmund Szi-li Wu, in U.S. Pat. No. 5,873,056, “Natural Language Processing System for Semantic Vector Representation Which Accounts for Lexical Ambiguity”, disclose a monolingual lexical database that contains nonhierarchical subject codes assigned to each word in the database. To avoid unnecessary reiteration of prior teachings, the disclosure of each reference cited herein is hereby incorporated by reference.

SUMMARY OF THE INVENTION

This invention provides a multilingual electronic dictionary comprising a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term (in a first language), a second term (in a second language), and a topical code. The topical code indicates a topical area in which the second term is a translation of the first term.

Such an electronic dictionary allows for selecting topic-appropriate translations of terms in a textual object in a first language into a second language. This is accomplished by:

-   (a) providing an electronic dictionary containing records comprising representations of terms in the first language and the second language;
-   (b) scanning a textual object in the first language to identify each occurrence of a term in the textual object in a record of the electronic dictionary;
-   (c) inserting each topical code associated with each of the records identified in step (b) into a data structure that provides for counting of the frequency of occurrence of each topical code; and
-   (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, selecting the term associated with the most frequently occurring topical code.
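Steps (c) and (d) above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: it assumes a C++11 compiler, and the `Record` layout and the function names `countTopics` and `selectTranslation` are hypothetical names chosen for this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory record: one translation pair plus its topical code.
struct Record {
    std::wstring l1Term;   // term in the first language
    std::wstring l2Term;   // term in the second language
    std::wstring topic;    // topical code, e.g. an IPC code
};

// Step (c): count how often each topical code occurs among the matched records.
std::map<std::wstring, int> countTopics(const std::vector<Record>& matched) {
    std::map<std::wstring, int> freq;
    for (const Record& r : matched) {
        ++freq[r.topic];   // operator[] value-initializes a new count to 0
    }
    return freq;
}

// Step (d): among the candidate translations of one source term, pick the one
// whose topical code is most frequent; the first candidate wins ties.
// Returns nullptr when there are no candidates.
const Record* selectTranslation(const std::vector<Record>& candidates,
                                const std::map<std::wstring, int>& freq) {
    const Record* best = nullptr;
    int bestCount = -1;
    for (const Record& c : candidates) {
        std::map<std::wstring, int>::const_iterator it = freq.find(c.topic);
        int n = (it == freq.end()) ? 0 : it->second;
        if (n > bestCount) {
            best = &c;
            bestCount = n;
        }
    }
    return best;
}
```

In use, `countTopics` would be called once for the block of text being analyzed, and `selectTranslation` once for each source term that has more than one candidate translation.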

In one embodiment of the invention, step (c) is performed by generating a table associating each topical code occurring in the textual object with its frequency of occurrence.

In another embodiment, as illustrated in the Examples below, step (c) is performed by the use of a map class.

In preferred embodiments of the apparatus and methods of this invention, one or more of the terms are represented in Unicode.

In the electronic dictionary of the present invention, each record may optionally include a representation of the part of speech for the first term, or for each term. However, this is not a necessary field in the record.

Also, each record may optionally include a representation of the language of the first term and of the language of said second term. Alternatively, a specific representation of the language (e.g. its name or a code such as JP indicating Japanese) may be omitted where an indication of the language is inherent in the structure of the record (e.g. the first field is always Japanese; the second is always English).

The dictionary of the present invention is not limited to bilingual records, but may be generated with records that accommodate a third language, or any number of languages.

In preferred embodiments of the invention, the topical coding system is a hierarchical one, e.g. the International Patent Classification system.

Such embodiments are desirably used for selecting topic-appropriate translations of terms in a textual object in a first language into a second language by doing the following:

-   (a) providing an electronic dictionary containing records comprising representations of terms in the first language and the second language along with a topical code from a hierarchical system;
-   (b) scanning a textual object in the first language to identify each occurrence of a term in the textual object in a record of said electronic dictionary;
-   (c) inserting each topical code associated with each of the records identified in step (b) into a plurality of data structures that provide for counting of the frequency of occurrence of each topical code at a plurality of levels of the hierarchy; and
-   (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, selecting the term associated with the most frequently occurring topical code;

wherein steps (c) and (d) are applied iteratively, first at a coarser level of the topical code hierarchy, then at successively more detailed levels of the topical code hierarchy until topical ambiguities are either completely resolved or resolved to the extent allowed by the most detailed level of the hierarchy.

DISCLOSURE OF INVENTION

The multilingual electronic dictionary of this invention provides for automatic topic disambiguation by including one or more topic codes in definitions contained in the dictionary.

A dictionary record according to this invention is part of a data structure contained in a machine-accessible memory. This record contains at least one topic code and comprises the following items (with an example of a record for the Japanese term “soshiki” as shown in Table 1):

TABLE 1

Term in Language 1 (Japanese)    “soshiki”
Part of Speech                   Noun
Term in Language 2 (English)     tissue
Part of Speech                   Noun
Topic Code                       A61K 47/38

Although the Japanese term is shown in the Tables herein in English characters within quotation marks, in the records of the present invention, a term in a language that uses other than English characters is preferably represented in a customary coding system such as Unicode. Alternatively, all terms, optionally including the Topic Code, may for consistency be represented in such a coding system.

The electronic dictionary of the present invention is embodied as a data structure in any form of machine-accessible memory, which may be permanent or transient. For example, the data structure may be stored using means known in the art for digital or other discrete encoding that is readable to produce a physical signal responding to the contents of selected memory locations, as by electromagnetic or optical means. Various magnetic memories are well known, including fixed disc drives, removable diskettes, tape, and cards. Integrated circuit memory modules may also be used in the present invention, including those in a self-contained form such as PCMCIA cards. A dictionary of the present invention may desirably be stored in permanent form, as on CD-ROM or like media.

In accordance with the present invention, the dictionary may be accessed by a general purpose computer running an operating system (e.g. Windows, Mac OS, Unix, Linux, Pick, etc.) suitable to access the memory on which the dictionary is resident (either permanently or transiently) and including suitable application programming. For purposes of exemplification, programming in the C++ language is disclosed herein, but the reader will appreciate that any of a wide variety of programming languages or database applications may alternatively be employed, including, for example, Pascal, Fortran, COBOL, Eiffel, Java; Access, dBase, FoxPro, Paradox, and the like.

A dictionary of the present invention may be incorporated in a standalone, handheld unit, as an enhanced version of translators such as those currently available from Selectronics, Sony EB Electronic Book, and Franklin Computer Corporation. Alternatively, a dictionary and associated programming in accordance with the present invention may be stored on a single general purpose computer or distributed on a CD-ROM; or it may be made available as a service via a network of computers, e.g. an intranet, a wide-area network, or a global communications network such as the Internet.

Although the records illustrated in this disclosure include fields for Part of Speech, the reader should understand that an electronic dictionary of the present invention does not require information as to a term's Part of Speech, and so the present invention may optionally be implemented without any such fields.

A dictionary record according to this invention may contain more than two languages. For example, it may also contain the term in German and French as shown in Table 2 below.

TABLE 2

Term in Language 1 (Japanese)    “soshiki”
Part of Speech                   Noun
Term in Language 2 (English)     tissue
Part of Speech                   Noun
Term in Language 3 (German)      Gewebe
Part of Speech                   Noun
Term in Language 4 (French)      tissu
Part of Speech                   Noun
Topic Code                       A61K 47/38

Alternatively, a record representing a term in more than two languages may be structured to include a field for a representation of the language for which each translation is provided. For example:

“soshiki”/Japanese/tissue/English/Gewebe/German/tissu/French/A61K 47/38

A dictionary record according to this invention may contain more than one topic code, as shown in the record in Table 3 below.

TABLE 3

Term in Language 1 (Japanese)    “maku”
Part of Speech                   Noun
Term in Language 2 (English)     membrane
Part of Speech                   Noun
Topic Code 1                     A61K 47/38
Topic Code 2                     B01D 61/36

There are several topical code systems in existence that can be used in this invention. This can be a nonhierarchical system such as that disclosed in Appendix A of U.S. Pat. No. 5,873,056. However, hierarchical topical code systems are preferred.

There are several known hierarchical topical code systems that can be used. Examples include the International Patent Classification (IPC) codes, the United States Patent Classifications, the categories of Roget's International Thesaurus®, the Dewey Decimal System, and the Library of Congress Card Catalog Classification system. Other subject codes that may be used include the Longman subject codes disclosed in the Longman Dictionary of Contemporary English published by Longman Group UK Limited, Longman House, Burnt Mill, Harlow, Essex CM22JE, England.

For example, the IPC codes contain five levels of classification as illustrated in Table 4 below.

TABLE 4

LEVEL         Example
FAMILY        A
CLASS         A61
SUBCLASS      A61K
MAIN GROUP    A61K 47
SUBGROUP      A61K 47/38

These levels in hierarchical topical code systems present the advantage of granularity. As is disclosed in the Examples below, if an ambiguity cannot be resolved at a shallow level, it may be resolvable at a deeper code level.
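The five levels of Table 4 can be derived from a full IPC code by taking prefixes of increasing length. The following is a minimal sketch, assuming the code is written in the layout of the Table 4 example (section letter, two-digit class, subclass letter, a space, and main group and subgroup separated by “/”); the function name `ipcLevels` is a hypothetical name chosen for this sketch.

```cpp
#include <string>
#include <vector>

// Returns the prefixes of an IPC code from the coarsest level to the finest,
// assuming the layout shown in Table 4 (e.g. L"A61K 47/38").
std::vector<std::wstring> ipcLevels(const std::wstring& code) {
    std::vector<std::wstring> levels;
    levels.push_back(code.substr(0, 1));        // family,     e.g. "A"
    levels.push_back(code.substr(0, 3));        // class,      e.g. "A61"
    levels.push_back(code.substr(0, 4));        // subclass,   e.g. "A61K"
    std::wstring::size_type slash = code.find(L'/');
    // If there is no '/', slash is npos and substr returns the whole code.
    levels.push_back(code.substr(0, slash));    // main group, e.g. "A61K 47"
    levels.push_back(code);                     // subgroup,   e.g. "A61K 47/38"
    return levels;
}
```

Counting frequencies separately for each of these prefixes is what allows a tie at the class level to be broken at the subclass, main group, or subgroup level.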

The topic codes for a given set of terms can be selected by locating a document that has previously been classified by topic and for which the source term and its translations are appropriate. For example, in translating a Japanese patent document into English, the main IPC code for that document can be used for Japanese-English term pairs that are encountered during translation.

According to this invention, these topic codes are used to determine a favored subject area within which to translate a particular term. Briefly, to determine the favored subject area, the topic codes for part or all of the terms in a particular block of text are counted. The block can be a sentence, a paragraph, a table, a set of text occurring within a certain number of bytes or words, a subdocument, an entire document or any other definable set of text in the source document.

There are several counting methods known to the art. For example, a two-column table comprising topical codes and their corresponding frequencies may be used. The preferred method is a “map” as used in the programming language C++. Descriptions of “map” may be found in Mark Nelson's “C++ Programmer's Guide to the Standard Template Library,” IDG Books Worldwide, Inc., Foster City, 1995, or in Microsoft Corporation's “Microsoft® Visual Studio™ 6.0 Development System”, 1998 (hereinafter, “MVS”). To quote the latter:

“The template class describes an object that controls a varying-length sequence of elements of type pair<const Key, T>. The first element of each pair is the sort key and the second is its associated value. The sequence is represented in a way that permits lookup, insertion, and removal of an arbitrary element with a number of operations proportional to the logarithm of the number of elements in the sequence (logarithmic time). Moreover, inserting an element invalidates no iterators, and removing an element invalidates only those iterators that point at the removed element.”

In the current invention, the topical code or a substring generated from the topical code is assigned to the Key and inserted into the map. The second member of the pair<const Key, T> may be arbitrary as the map only needs to be used for counting the frequencies of the codes.
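One detail is worth noting when the counting is implemented with the C++ standard library: `std::map` stores at most one element per key, so inserting a key that is already present leaves the map unchanged and `count()` returns only 0 or 1. The sketch below shows two ways of obtaining frequencies under that constraint. It is an illustration under stated assumptions, not the disclosed implementation; the function names `countWithMap` and `countWithMultimap` are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Variant 1: std::map keeps one element per key, so the mapped value
// carries the running count for that key.
std::map<std::wstring, int> countWithMap(const std::vector<std::wstring>& codes) {
    std::map<std::wstring, int> freq;
    for (const std::wstring& c : codes) {
        ++freq[c];   // a missing key is created with count 0, then incremented
    }
    return freq;
}

// Variant 2: std::multimap keeps every inserted pair, so the mapped value
// can be arbitrary and count(key) gives the frequency of the key.
std::multimap<std::wstring, int> countWithMultimap(const std::vector<std::wstring>& codes) {
    std::multimap<std::wstring, int> seen;
    for (const std::wstring& c : codes) {
        seen.insert(std::make_pair(c, 0));   // the value 0 is a placeholder
    }
    return seen;
}
```

With the first variant the frequency of a code is read from the mapped value (for example through `find`); with the second it is read through `count`.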

If a document has been assigned one or more topical codes, for example, IPC codes in the case of patent documents, these assigned codes can optionally be added to the counts of topical codes for the block of text being analyzed. It should be stressed that the use of such document-level assigned codes is optional and that the method of the current invention can be applied to any text.

While there are several coding systems that may be used in this invention, the preferred coding system is the Unicode® wide character set as described in Microsoft Corporation's “Microsoft® Visual Studio™ 6.0 Development System”, 1998 as follows:

“Unicode: The Wide Character Set

A wide character is a 2-byte multilingual character code. Any character in use in modern computing worldwide, including technical symbols and special publishing characters, can be represented according to the Unicode specification as a wide character. Developed and maintained by a large consortium that includes Microsoft, the Unicode standard is now widely accepted. Because every wide character is always represented in a fixed size of 16 bits, using wide characters simplifies programming with international character sets.”

A complete listing of the Unicode® codes preferred for this invention, and especially preferred for encoding Japanese, Chinese and Korean terms for this invention, can be found in The Unicode Consortium, “The Unicode Standard: Worldwide Character Encoding, Version 1.0, Vols. 1 and 2”, Addison-Wesley, Reading, Mass., 1992.

EXAMPLES

EXAMPLE I

Bilingual Electronic Dictionary Containing Hierarchical Topical Codes And Methods For Its Use

Example Ia. Bilingual Electronic Dictionary Containing Hierarchical Topical Codes

1) Structure of Dictionary Record in Storage

With the character coding system being Unicode®, codes belonging to the user domain of the Unicode® system are selected as record and field delimiters as follows:

<record>     0xe000    // start of record
</record>    0xe001    // end of record
<lang1>      0xe011    // start of first language term field
</lang1>     0xe021    // end of first language term field
<lang2>      0xe013    // start of second language term field
</lang2>     0xe023    // end of second language term field
<topic>      0xe0a0    // start of topical code field
</topic>     0xe0a1    // end of topical code field

In the list above, 0xe011 is a hexadecimal representation of the two-byte (16-bit) code consisting of the hexadecimal bytes e0 and 11.

A dictionary entry then has the following sequence of 16-bit codes:

<record><lang1>Japanese term</lang1><lang2>English term</lang2><topic>IPC code</topic></record>
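The stored form of a record can be sketched as a string of 16-bit code units in which the delimiter codes listed above bracket each field. The following is a minimal sketch, not the disclosed routine: it uses `char16_t` and `std::u16string` (C++11) so that each unit is 16 bits regardless of platform, whereas the disclosure relies on the 2-byte `wchar_t` of its Windows environment, and the names `ParsedRecord`, `encodeRecord` and `parseRecord` are hypothetical.

```cpp
#include <string>

// Delimiter codes from the list above (Unicode private-use area).
const char16_t REC_START   = 0xe000, REC_END   = 0xe001;
const char16_t L1_START    = 0xe011, L1_END    = 0xe021;
const char16_t L2_START    = 0xe013, L2_END    = 0xe023;
const char16_t TOPIC_START = 0xe0a0, TOPIC_END = 0xe0a1;

struct ParsedRecord {      // hypothetical name for the in-memory form
    std::u16string l1Term, l2Term, topic;
};

// Serializes one record as the sequence of 16-bit codes shown above.
std::u16string encodeRecord(const ParsedRecord& r) {
    std::u16string out;
    out += REC_START;
    out += L1_START;    out += r.l1Term; out += L1_END;
    out += L2_START;    out += r.l2Term; out += L2_END;
    out += TOPIC_START; out += r.topic;  out += TOPIC_END;
    out += REC_END;
    return out;
}

// Reads one record back by scanning code by code; returns false if no
// end-of-record delimiter is found.
bool parseRecord(const std::u16string& in, ParsedRecord& out) {
    std::u16string* field = nullptr;   // field currently being filled, if any
    for (char16_t ch : in) {
        switch (ch) {
            case REC_START:   out = ParsedRecord(); field = nullptr; break;
            case L1_START:    field = &out.l1Term;  break;
            case L2_START:    field = &out.l2Term;  break;
            case TOPIC_START: field = &out.topic;   break;
            case L1_END: case L2_END: case TOPIC_END: field = nullptr; break;
            case REC_END:     return true;
            default:          if (field) field->push_back(ch);
        }
    }
    return false;
}
```

Because the delimiters are taken from the private-use area, they cannot collide with characters of the terms themselves, which is what makes this single-pass scan sufficient.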

The records in the dictionaries are selected and constructed by manually examining patents in many IPC classifications. For a particular patent in this Example, the language of the patent, Japanese, is taken as the first language and English is taken as the second language. For a given Japanese term in the patent, the Japanese term is entered into the first language term field and the English translation of that term which is most appropriate for the topic of the patent is entered into the second language term field. The main IPC classification for the patent is entered into the topical code field.

The dictionary is entered into storage using Windows dialog methods as disclosed in David J. Kruglinski, “Inside Visual C++,” 4th Ed., Microsoft Press, 1997 and known to persons knowledgeable in the field.

A simple parse routine known to persons knowledgeable in the field is used to read the dictionary record into active memory as described below.

2) Dictionary File

The dictionary file consists of a sequence in storage of the above dictionary records. It also contains an array of offsets of the dictionary records so that the records can be addressed by the record location number contained in the node elements of the index file.

3) Index File

The index file for the aforesaid dictionary file is a multinodal tree made up of nodes and node elements as follows. There is one tree per language in the dictionary. If the dictionary is only to be used in one direction, for example, from the first language to the second language, only one tree for the source language need be present. Within each node, the node elements are arranged in Unicode® code order according to the key. Ordering of keys is determined by the Unicode® character order. The first 10 characters of a term are contained in the node element for speed. If the term is shorter than 10 characters, the key is terminated by a NULL. For terms 10 or more characters in length, the dictionary record is parsed into memory so that the full term can be compared. (This is a simple variant of the B-tree method disclosed in Donald E. Knuth, “The Art of Computer Programming, Volume 3, Sorting and Searching”, Addison-Wesley, Reading, Mass., 1973, pp. 473-476, hereinafter “Knuth”.) The choice of 127 elements per node makes the size of a node 4096 bytes, which is the size of a memory page of the Windows® NT operating system as implemented on Intel x86 microprocessors. In the index file, the nodes are aligned on page boundaries so that a single file access reads a complete node into physical memory. Persons knowledgeable in the field can adapt the parameters stated herein to the operating system and file structures as deemed appropriate.

a) Node Class:

    class Node {
        DWORD m_ThisNode;              // number of this node
        DWORD m_NumberOfElements;      // number of node elements currently in node
        DWORD m_MaxNumberOfElements;
        DWORD m_ParentNode;
        DWORD m_ParentElement;         // number of parent element in parent node
        DWORD m_FirstElement;
        DWORD m_bBottomNode;           // true if this is a bottom node
        DWORD m_LastElement;           // last element in linked list
        NodeElement m_Elements[127];
    };

b) Node Element Class:

    class NodeElement {
        wchar_t key[10];   // hold up to 10 characters from record
        DWORD childNode;   // for keys less than this key
        DWORD next;        // next element in node
        DWORD record;      // number of record in dictionary
    };
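The stated node size can be checked arithmetically: a node element occupies 10 × 2 + 3 × 4 = 32 bytes, and a node occupies 8 × 4 + 127 × 32 = 4096 bytes. The sketch below expresses the same check at compile time. It is an illustration under assumptions that are not part of the disclosure: `DWORD` is modeled as a 32-bit unsigned integer, the 2-byte `wchar_t` of the Windows environment is modeled with `char16_t`, and the eight header members of the node are collapsed into one array.

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;   // assumption: 32-bit DWORD, as on Windows

struct NodeElement {
    char16_t key[10];   // 16-bit characters; stands in for a 2-byte wchar_t
    DWORD childNode;
    DWORD next;
    DWORD record;
};

struct Node {
    DWORD header[8];               // the eight DWORD members of the node class
    NodeElement elements[127];
};

// These checks fail to compile if the layout does not fill exactly one page.
static_assert(sizeof(NodeElement) == 32, "10 * 2 + 3 * 4 bytes per element");
static_assert(sizeof(Node) == 8 * 4 + 127 * 32, "header plus 127 elements");
static_assert(sizeof(Node) == 4096, "one 4096-byte memory page");
```

On a platform with a different page size, the same arithmetic would suggest a different element count per node.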

4) Structure of Dictionary in Active Memory

When associated with a textual object, the aforementioned dictionary record is parsed into an active object, for example a C++ object having the following member variables, where “wstring” is a wide character (Unicode®) string according to MVS:

    class Record {
    public:
        wstring wsL1Term;  // term from first language
        wstring wsL2Term;  // term from second language
        wstring wsTopic;   // topical code
    };

Example Ib. Use of Bilingual Dictionary in a Single Level Method of Selecting Translations of Terms Using Hierarchical Topical Codes

1) Association of Dictionary Entries with a Textual Object in a First Language

Given a first language wstring wsLang1String of length lenL1S, at each position m in the wstring, the index file is searched for first language terms that match wsLang1String starting at position m, and copies of these dictionary records are then read into active memory as follows:

a) A CPtrArray paMaster (as described in MVS) is constructed with length lenL1S and initialized with NULL at each position in the array. In operation, this will become an array of arrays, with a CPtrArray paRecords of dictionary records at each position in this array.

Example of paMaster before adding dictionary records, Table 5:

TABLE 5

paMaster[0]                NULL
(intervening paMasters)    (all NULL)
paMaster[m]                NULL
paMaster[m + 1]            NULL
(intervening paMasters)    (all NULL)
paMaster[lenL1S-1]         NULL

b) Starting from position 0 in wsLang1String, the index file is searched following the method described in Knuth. The records in the dictionary for which the first language term matches the substring of wsLang1String of the same length as the first language term are read into active memory as follows:

The record is parsed by sequentially reading each character of the record. Upon reading a <record> code, the program allocates memory for an instance of the class Record. After reading the <lang1> code, the program places subsequent characters in wsL1Term until the </lang1> code is read. Likewise, after reading the <lang2> code, the program places subsequent characters in wsL2Term until the </lang2> code is read. Continuing, after reading the <topic> code, the program places subsequent characters in wsTopic until the </topic> code is read. Upon reading the </record> code, the program appends the pointer to this record to the CPtrArray at position 0. If this is the first record at that position, i.e., there is a NULL at that position, the program allocates a CPtrArray for the record and assigns its pointer to paMaster[0], then appends the pointer of the new dictionary record to this newly allocated CPtrArray.

c) The procedure in b) is repeated for each position in wsLang1String. When these iterations are complete, each position m of paMaster will contain either a NULL (if no matching dictionary records were found) or a pointer to a CPtrArray of dictionary records.

Example of paMaster after adding dictionary records, Table 6:

TABLE 6

paMaster[0]                *paRecords (0th)
(intervening paMasters)    (may include NULL)
paMaster[m]                *paRecords (mth)
paMaster[m + 1]            NULL
(intervening paMasters)    (may include NULL)
paMaster[lenL1S-1]         *paRecords ((lenL1S-1)th) (may be NULL)

In this example, NULL entries remain where no matching dictionary entries were found.

An example of a paRecord after adding dictionary records is shown in Table 7 below. In this case, three matching records were found and copied into active memory.

TABLE 7

paRecord[0]    Pointer to first record
paRecord[1]    Pointer to second record
paRecord[2]    Pointer to third record
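The association of dictionary records with each position of the source text, as laid out in steps a) through c) above, can be sketched with standard containers. This is a simplified illustration and not the disclosed implementation: a `std::multimap` scanned linearly stands in for the B-tree index file, a vector of vectors stands in for the CPtrArray of CPtrArrays, and the names `Index` and `matchAllPositions` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

struct Record {
    std::wstring wsL1Term, wsL2Term, wsTopic;
};

// Stand-in for the index file: first-language term -> dictionary records.
typedef std::multimap<std::wstring, Record> Index;

// For every position m of the source text, collect the records whose
// first-language term matches the text starting at m. An empty inner
// vector plays the role of a NULL entry in paMaster.
std::vector<std::vector<Record>> matchAllPositions(const std::wstring& text,
                                                   const Index& index) {
    std::vector<std::vector<Record>> master(text.size());
    for (std::wstring::size_type m = 0; m < text.size(); ++m) {
        for (Index::const_iterator it = index.begin(); it != index.end(); ++it) {
            const std::wstring& term = it->first;
            // compare() checks the substring of the same length as the term.
            if (!term.empty() && text.compare(m, term.size(), term) == 0) {
                master[m].push_back(it->second);
            }
        }
    }
    return master;
}
```

The linear scan keeps the sketch short; a tree search such as the one described for the index file would replace the inner loop in practice.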

2) Inserting Topical Codes Into a Structure that Allows for Counting their Frequencies of Occurrence

For each member of the pointer array, a key consisting of the first three characters of the IPC code contained in wsTopic (i.e., the class code) is added to the map defined by:

    typedef map<wstring, int> STR2INT;

to get the map

    STR2INT smapClassCode;

Following the teachings of MVS for the map template class, the following C++ code segment is executed for each member of paMaster:

    for (int i = 0; i < lenL1S; i++)
    {
        if (paMaster[i] != NULL)
        {
            CPtrArray* pRecords = (CPtrArray*)paMaster[i];
            int nRecords = pRecords->GetSize();   // number of dictionary records copied
            for (int j = 0; j < nRecords; j++)
            {
                Record* pRecord = (Record*)pRecords->GetAt(j);
                // A map holds one element per key, so the count is kept in the
                // mapped value; operator[] creates a missing key with count 0.
                smapClassCode[pRecord->wsTopic.substr(0, 3)]++;
            }
        }
    }

2a) Alternate Structure and Method for Counting Topical Codes:

As an alternative to a map structure, a list structure (as disclosed in MVS) can be used by appending each instance of the topical code or of the substring derived from the topical code to a list. After all of the topical codes have been appended, this list can optionally be sorted. A vector is then constructed using the following C++ structures:

    typedef struct {
        wstring wsCode;
        int nCount;
    } STRINTPAIR;
    typedef vector<STRINTPAIR> STRINTARRAY;
    STRINTARRAY staCodeFrequencyTable;

For each unique topical code in the list, the number of occurrences of that code in the list is counted, the topical code is assigned to wsCode and the count to nCount of a new STRINTPAIR, and this STRINTPAIR is appended to the STRINTARRAY with the STRINTARRAY::push_back function.

For this implementation, the most frequently occurring topical code is determined by looking up and comparing nCount for the respective topical codes in STRINTARRAY. The alternative structure described hereinabove can be used as an alternative in any of the following examples as well.
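The list-based alternative can be sketched as follows. This is a minimal illustration under stated assumptions: the list is always sorted (the text makes sorting optional) so that equal codes become adjacent and one pass suffices, and the function names `buildFrequencyTable` and `lookupCount` are hypothetical.

```cpp
#include <list>
#include <string>
#include <vector>

struct STRINTPAIR {
    std::wstring wsCode;
    int nCount;
};
typedef std::vector<STRINTPAIR> STRINTARRAY;

// Builds the code/frequency table from a list of topical codes.
STRINTARRAY buildFrequencyTable(std::list<std::wstring> codes) {
    codes.sort();   // equal codes become adjacent
    STRINTARRAY table;
    for (const std::wstring& code : codes) {
        if (!table.empty() && table.back().wsCode == code) {
            ++table.back().nCount;          // another occurrence of the same code
        } else {
            STRINTPAIR p = { code, 1 };     // first occurrence of a new code
            table.push_back(p);
        }
    }
    return table;
}

// Looks up the count for a code; returns 0 if the code never occurred.
int lookupCount(const STRINTARRAY& table, const std::wstring& code) {
    for (const STRINTPAIR& p : table) {
        if (p.wsCode == code) return p.nCount;
    }
    return 0;
}
```

Selecting among candidate records then amounts to calling `lookupCount` for each candidate's code and keeping the candidate with the largest result.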

3) Selecting Translations of Terms in Second Language Using Frequencies of Topical Codes

For each of the positions m of wsLang1String, if there is more than one matching first language term from the dictionary, the numbers of occurrences in smapClassCode of the first three characters of the IPC codes of the matching records are compared. The term with the most frequent IPC key is selected.

For the particular example of paRecord shown above, the following code segment selects the record with the most frequently occurring IPC code:

    int nMostFrequent = 0;  // initialize to first record in paRecord
    int nMaxCount = 0;      // for following the maximum count
    for (int i = 0; i < paRecord.GetSize(); i++)
    {
        Record* pRecord = (Record*)paRecord[i];
        wstring wsTest = pRecord->wsTopic.substr(0, 3);
        // The frequency is read from the value stored under the key;
        // map::count() would only report whether the key is present (0 or 1).
        STR2INT::iterator it = smapClassCode.find(wsTest);
        int nCount = (it != smapClassCode.end()) ? it->second : 0;
        if (nCount > nMaxCount)
        {
            nMostFrequent = i;
            nMaxCount = nCount;
        }
    }

Note that in this example, if more than one IPC class code has the same frequency, the first of the tied records to occur in paRecord is selected.

Example Ic. Use of Bilingual Dictionary in a Multilevel Method of Selecting Translations of Terms Using Hierarchical Topical Codes

1) Association of Dictionary Entries with a Textual Object in a First Language

This step is the same as for Example Ib above.

2) Inserting Topical Codes into a Structure that Allows for Counting their Frequencies of Occurrence

Instead of the single smapClassCode in Example Ib, four maps are constructed, namely,

STR2INT smapClassCode; // from IPC code through class level
STR2INT smapSubclassCode; // from IPC code through subclass level
STR2INT smapGroup; // from IPC code through main group level
STR2INT smapSubgroupCode; // from IPC code through subgroup level

In this case, the following C++ code segment is executed for each member of paMaster as follows:

for(int i = 0; i < lenL1S; i++)
{
  if(paMaster[i] != NULL)
  {
    int nRecords = paMaster[i].GetLength( ); // number of dictionary records copied
    CPtrArray* pRecords = paMaster[i];
    for(int j = 0; j < nRecords; j++)
    {
      Record* pRecord = pRecords[j];
      pair<wstring, int> newPair; // pair for insertion into the maps
      newPair.first = pRecord->wsTopic.left(3); // IPC code to class level
      newPair.second = 1;
      smapClassCode.insert(newPair);
      newPair.first = pRecord->wsTopic.left(4); // IPC code to subclass level
      newPair.second = 1;
      smapSubclassCode.insert(newPair);
      int nGroupDelimiterPosition = pRecord->wsTopic.find(L'/');
      if(nGroupDelimiterPosition < 0) // no '/' found, so look for ':'
        nGroupDelimiterPosition = pRecord->wsTopic.find(L':');
      newPair.first = pRecord->wsTopic.left(nGroupDelimiterPosition); // IPC code to main group level
      newPair.second = 1;
      smapGroup.insert(newPair);
      newPair.first = pRecord->wsTopic; // entire IPC code
      newPair.second = 1;
      smapSubgroupCode.insert(newPair);
    }
  }
}
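The four-level counting above can be sketched in standard C++ as follows. This is a minimal sketch under stated assumptions, not the MVS listing itself: LevelCounts and addIpcCode are hypothetical names, substr stands in for left, and the codes in the usage note are illustrative strings. With std::map, insert leaves the value of a key that is already present unchanged and count returns only 0 or 1, so the sketch increments the mapped value through operator[] to keep a running frequency; a std::multimap, whose count returns the number of entries stored for a key, would be another way to obtain frequencies.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::wstring, int> STR2INT;

// One frequency map per level of the IPC hierarchy (struct name is illustrative).
struct LevelCounts {
    STR2INT smapClassCode;     // keys such as L"A61"
    STR2INT smapSubclassCode;  // keys such as L"A61K"
    STR2INT smapGroup;         // keys up to, but not including, '/' or ':'
    STR2INT smapSubgroupCode;  // keys are entire codes
};

// Records one occurrence of an IPC code at all four levels.
void addIpcCode(LevelCounts& lc, const std::wstring& code) {
    assert(code.size() >= 4);  // assumes at least class and subclass are present
    ++lc.smapClassCode[code.substr(0, 3)];
    ++lc.smapSubclassCode[code.substr(0, 4)];
    std::wstring::size_type pos = code.find(L'/');
    if (pos == std::wstring::npos) pos = code.find(L':');
    ++lc.smapGroup[code.substr(0, pos)];  // npos yields the whole code
    ++lc.smapSubgroupCode[code];
}
```

For instance, after adding the illustrative codes "A61K 31/00", "A61K 9/20" and "A61B 5/00", the class key "A61" has a frequency of 3 and the subclass key "A61K" a frequency of 2.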

3) Selecting Translations of Terms in Target Language Using Frequencies of Topical Codes

For each of the positions m of wsLang1String, if there is more than one matching first language term from the dictionary, the numbers of occurrences in smapClassCode of the first three characters of the IPC codes of the matching first language terms are compared. The term with the most frequent IPC key is selected. If two or more terms still have the same number of IPC class code occurrences, the numbers of occurrences of their subclass codes in smapSubclassCode are compared. This comparison is repeated at successively finer levels until either a single term is selected or the comparison at the subgroup level has been completed.

For the particular example of paRecord shown above, the following C++ code segment selects the record with the most frequently occurring IPC code:

int nSelectedRecord = 0; // initialize to first record in paRecord
int nMostFrequentClass = 0; // initialize to first record in paRecord
int nMaxCountClass = 0; // for following the maximum count
int nClassRedundancy = 0;
int nMostFrequentSubclass = 0; // initialize to first record in paRecord
int nMaxCountSubclass = 0; // for following the maximum count
int nSubclassRedundancy = 0;
int nMostFrequentGroup = 0; // initialize to first record in paRecord
int nMaxCountGroup = 0; // for following the maximum count
int nGroupRedundancy = 0;
int nMostFrequentSubgroup = 0; // initialize to first record in paRecord
int nMaxCountSubgroup = 0; // for following the maximum count
int nSubgroupRedundancy = 0;
for(int i = 0; i < paRecord.GetLength( ); i++)
{
  Record* pRecord = pRecords[i];
  wstring wsClass = pRecord->wsTopic.left(3);
  int nCountClass = smapClassCode.count(wsClass);
  if(nCountClass > nMaxCountClass)
  {
    nMostFrequentClass = i;
    nMaxCountClass = nCountClass;
    nClassRedundancy = 1;
  }
  else if(nCountClass == nMaxCountClass)
  {
    nClassRedundancy++;
  }
  wstring wsSubclass = pRecord->wsTopic.left(4);
  int nCountSubclass = smapSubclassCode.count(wsSubclass);
  if(nCountSubclass > nMaxCountSubclass)
  {
    nMostFrequentSubclass = i;
    nMaxCountSubclass = nCountSubclass;
    nSubclassRedundancy = 1;
  }
  else if(nCountSubclass == nMaxCountSubclass)
  {
    nSubclassRedundancy++;
  }
  int nGroupDelimiterPosition = pRecord->wsTopic.find(L'/');
  if(nGroupDelimiterPosition < 0) // no '/' found, so look for ':'
    nGroupDelimiterPosition = pRecord->wsTopic.find(L':');
  wstring wsGroup = pRecord->wsTopic.left(nGroupDelimiterPosition);
  int nCountGroup = smapGroup.count(wsGroup);
  if(nCountGroup > nMaxCountGroup)
  {
    nMostFrequentGroup = i;
    nMaxCountGroup = nCountGroup;
    nGroupRedundancy = 1;
  }
  else if(nCountGroup == nMaxCountGroup)
  {
    nGroupRedundancy++;
  }
  wstring wsSubgroup = pRecord->wsTopic;
  int nCountSubgroup = smapSubgroupCode.count(wsSubgroup);
  if(nCountSubgroup > nMaxCountSubgroup)
  {
    nMostFrequentSubgroup = i;
    nMaxCountSubgroup = nCountSubgroup;
    nSubgroupRedundancy = 1;
  }
  else if(nCountSubgroup == nMaxCountSubgroup)
  {
    nSubgroupRedundancy++;
  }
}
if(nClassRedundancy == 1)
  nSelectedRecord = nMostFrequentClass;
else if(nSubclassRedundancy == 1)
  nSelectedRecord = nMostFrequentSubclass;
else if(nGroupRedundancy == 1)
  nSelectedRecord = nMostFrequentGroup;
else if(nSubgroupRedundancy == 1)
  nSelectedRecord = nMostFrequentSubgroup;

In this example, if there is redundancy at all levels, the first record is selected.
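The repeated comparison described in step 3 can also be read as a successive narrowing of the tied candidates. The following is a minimal sketch of that reading, not the redundancy-counter approach of the listing: it assumes that the frequency of each candidate's code at each of the four levels has already been looked up, and the names LevelFrequencies and selectMultilevel are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// counts[i][level] is the frequency of candidate i's code at that level:
// level 0 = class, 1 = subclass, 2 = main group, 3 = subgroup.
typedef std::array<int, 4> LevelFrequencies;

// Returns the index of the selected candidate. At each level only the
// candidates tied for the highest frequency are kept; the loop stops as soon
// as one candidate remains or the subgroup level has been compared.
std::size_t selectMultilevel(const std::vector<LevelFrequencies>& counts) {
    assert(!counts.empty());
    std::vector<std::size_t> candidates;
    for (std::size_t i = 0; i < counts.size(); ++i) candidates.push_back(i);

    for (std::size_t level = 0; level < 4 && candidates.size() > 1; ++level) {
        int best = -1;  // frequencies are assumed to be non-negative
        std::vector<std::size_t> kept;
        for (std::size_t c = 0; c < candidates.size(); ++c) {
            const int n = counts[candidates[c]][level];
            if (n > best) { best = n; kept.clear(); }
            if (n == best) kept.push_back(candidates[c]);
        }
        candidates = kept;
    }
    return candidates.front();  // first remaining record if still tied
}
```

Narrowing in this way is equivalent to choosing the first candidate whose four frequencies, taken from class to subgroup, are lexicographically greatest.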

Example Id. Use of Bilingual Dictionary in a Single Level Method of Selecting Translations of Terms Using Hierarchical Topical Codes, Additionally Using Topical Codes Assigned to the Document as a Whole

This method applies to a patent or patent application for which IPC classifications have been assigned and the corresponding codes are included in the bibliographic portion of the document.

1) Association of Dictionary Entries with a Textual Object in a First Language

This step is the same as for Example Ib above.

2) Inserting Topical Codes Into a Structure that Allows for Counting their Frequencies of Occurrence

This step is the same as for Example Ib above, with the exception that the subclass codes of the bibliographic IPC codes are also added to smapClassCode. This is done by assigning each IPC code for the document to a wstring wsIPCCode and adding these to smapClassCode with the following C++ code segment:

for(each IPC code wsIPCCode in bibliographic header) // pseudocode
{
  pair<wstring, int> newPair; // pair for insertion into smapClassCode
  newPair.first = wsIPCCode.left(3);
  newPair.second = 1;
  smapClassCode.insert(newPair);
}
for(int i = 0; i < lenL1S; i++)
{
  if(paMaster[i] != NULL)
  {
    int nRecords = paMaster[i].GetLength( ); // number of dictionary records copied
    CPtrArray* pRecords = paMaster[i];
    for(int j = 0; j < nRecords; j++)
    {
      Record* pRecord = pRecords[j];
      pair<wstring, int> newPair; // pair for insertion into smapClassCode
      newPair.first = pRecord->wsTopic.left(3);
      newPair.second = 1;
      smapClassCode.insert(newPair);
    }
  }
}
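The effect of the bibliographic codes can be illustrated with a short sketch in standard C++. It is offered under stated assumptions: addClassCodes and countWithBibliographicCodes are hypothetical names, both inputs are plain vectors of code strings rather than the paMaster structure, and frequencies are accumulated by incrementing the mapped value through operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, int> STR2INT;

// Adds the first three characters of every code to the frequency map.
void addClassCodes(STR2INT& smapClassCode, const std::vector<std::wstring>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i)
        ++smapClassCode[codes[i].substr(0, 3)];
}

// Counts the codes from the bibliographic header together with the codes of
// the dictionary records associated with the text.
STR2INT countWithBibliographicCodes(const std::vector<std::wstring>& bibliographicCodes,
                                    const std::vector<std::wstring>& recordTopics) {
    STR2INT smapClassCode;
    addClassCodes(smapClassCode, bibliographicCodes);  // codes assigned to the document
    addClassCodes(smapClassCode, recordTopics);        // codes from dictionary records
    return smapClassCode;
}
```

When two record codes would otherwise be tied at one occurrence each, a single bibliographic code in the same class as one of them raises that class to two occurrences and so decides the selection.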

3) Selecting Translations of Terms in Second Language Using Frequencies of Topical Codes

This step is the same as for Example Ib above.

EXAMPLE II Multilingual Electronic Dictionary Containing Hierarchical Topical Codes and Method for Its Use

Example IIa. Multilingual Electronic Dictionary Containing Hierarchical Topical Codes

1) Structure of Dictionary Record in Storage

With the character coding system being Unicode®, codes belonging to the user domain of the Unicode® system are selected as record and field delimiters as follows:

<record>   0xe000  //start of record
</record>  0xe001  //end of record
<lang1>    0xe010  //start of first language term field
</lang1>   0xe011  //end of first language term field
<lang2>    0xe020  //start of second language term field
</lang2>   0xe021  //end of second language term field
<lang3>    0xe030  //start of third language term field
</lang3>   0xe031  //end of third language term field
<lang4>    0xe040  //start of fourth language term field
</lang4>   0xe041  //end of fourth language term field
<topic>    0xe0a0  //start of topical code field
</topic>   0xe0a1  //end of topical code field

In the list above, 0xE011 is a hexadecimal representation of the two-byte (16-bit) code consisting of the hexadecimal bytes E0 and 11.

A dictionary entry then has the following sequence of 16-bit codes:

<record><lang1>Japanese term</lang1><lang2>English term</lang2><lang3>German term</lang3><lang4>French term</lang4><topic>IPC code</topic></record>

The records in the dictionaries are selected and constructed by manually examining patents in many IPC classifications. For a particular patent in this Example, the language of the patent, Japanese, is taken as the first language, English as the second language, German as the third language and French as the fourth language. For a given Japanese term in the patent, the Japanese term is entered into the first language term field, and the English translation of that term which is most appropriate for the topic of the patent is entered into the second language term field. Likewise, the appropriate German and French terms are entered into their respective fields. The main IPC classification for the patent is entered into the topical code field.

A simple parse routine known to persons knowledgeable in the field is used to read the dictionary record into active memory.
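One possible form of such a parse routine is sketched below in standard C++. It is a minimal sketch under stated assumptions, not the routine actually used: the record is held in a std::u16string so that each element is one 16-bit code, the Record struct and parseRecord are hypothetical names, and the delimiter values are those listed above for this Example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In-memory form of one dictionary record (illustrative layout).
struct Record {
    std::u16string wsL1Term, wsL2Term, wsL3Term, wsL4Term, wsTopic;
};

// Parses the first complete record in `data`; returns false if none is found.
bool parseRecord(const std::u16string& data, Record& out) {
    std::u16string* field = nullptr;  // field currently receiving characters
    bool inRecord = false;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const char16_t ch = data[i];
        switch (ch) {
        case 0xe000: out = Record(); inRecord = true; field = nullptr; break;  // <record>
        case 0xe001: if (inRecord) return true; break;                         // </record>
        case 0xe010: field = &out.wsL1Term; break;  // <lang1>
        case 0xe020: field = &out.wsL2Term; break;  // <lang2>
        case 0xe030: field = &out.wsL3Term; break;  // <lang3>
        case 0xe040: field = &out.wsL4Term; break;  // <lang4>
        case 0xe0a0: field = &out.wsTopic;  break;  // <topic>
        case 0xe011: case 0xe021: case 0xe031: case 0xe041: case 0xe0a1:
            field = nullptr; break;                 // any end-of-field code
        default:
            if (inRecord && field != nullptr) field->push_back(ch);
        }
    }
    return false;  // end of data reached before </record>
}
```

Characters that fall outside any field, or before the first start-of-record code, are ignored by this sketch; a stricter routine might treat them as an error.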

2) Dictionary File

The dictionary file consists of a sequence in storage of the above dictionary records. It also contains an array of offsets of the dictionary records so that the records can be addressed by the record location number contained in the node elements of the index file.
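The relationship between the stored records and the array of offsets can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: DictionaryFile, append and recordAt are hypothetical names, the storage is a std::u16string rather than a file on disk, and offsets are measured in 16-bit code units (the text does not state the unit).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct DictionaryFile {
    std::u16string storage;            // stored records, one after another
    std::vector<std::size_t> offsets;  // offsets[n] = start of record n in storage

    // Appends a record and returns its record location number.
    std::size_t append(const std::u16string& record) {
        offsets.push_back(storage.size());
        storage += record;
        return offsets.size() - 1;
    }

    // Returns record n, located through the offset array.
    std::u16string recordAt(std::size_t n) const {
        assert(n < offsets.size());
        const std::size_t begin = offsets[n];
        const std::size_t end =
            (n + 1 < offsets.size()) ? offsets[n + 1] : storage.size();
        return storage.substr(begin, end - begin);
    }
};
```

A node element of the index then only needs to hold the small record location number, and the offset array translates that number into a position in storage.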

3) Index File

The index file for the aforesaid dictionary file is a multinodal tree made up of nodes and node elements as follows. There is one tree per language in the dictionary. If the dictionary is only to be used in one direction, for example, from the first language to the second, third and fourth languages, only one tree for the first language need be present. Within each node, the node elements are arranged in Unicode® code order according to the key. Ordering of keys is determined by the Unicode® character order. The first 10 characters of a term are contained in the node element for speed. If the term is fewer than 10 characters in length, the key is terminated by a NULL. For terms 10 or more characters in length, the dictionary record is parsed into memory so that the full term can be compared. (This is a simple variant of the B-tree method disclosed in Donald E. Knuth, “The Art of Computer Programming, Volume 3, Sorting and Searching”, Addison-Wesley, Reading, Mass., 1973, pp. 473-476.) The choice of 127 elements makes the size of a node 4096 bytes, which is the size of a memory page of the Windows® NT operating system as implemented on Intel x86 microprocessors. In the index file, the nodes are aligned on page boundaries so that a single file access reads a complete node into physical memory.

a) Node Class

class Node
{
  DWORD m_ThisNode; //number of this node
  DWORD m_NumberOfElements; //number of node elements currently in node
  DWORD m_MaxNumberOfElements;
  DWORD m_ParentNode;
  DWORD m_ParentElement; //number of parent element in parent node
  DWORD m_FirstElement;
  DWORD m_bBottomNode; //true if this is a bottom node
  DWORD m_LastElement; //last element in linked list
  NodeElement m_Elements[127];
};

b) Node Element Class:

class NodeElement
{
  wchar_t key[10]; //hold up to 10 characters from record
  DWORD childNode; //for keys less than this key
  DWORD next; //next element in node
  DWORD record; //number of record in dictionary
};
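The 4096-byte figure can be checked at compile time. The sketch below restates the two classes with fixed-width types, on the assumption that DWORD is a 32-bit unsigned integer and that each key character occupies 16 bits (as wchar_t does on Windows); char16_t and std::uint32_t are substituted so that the check does not depend on the platform's wchar_t. The helper setKey is a hypothetical name that illustrates the NULL-terminated key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct NodeElement {
    char16_t key[10];         // up to 10 characters from the term: 20 bytes
    std::uint32_t childNode;  // for keys less than this key
    std::uint32_t next;       // next element in node
    std::uint32_t record;     // number of record in dictionary
};

struct Node {
    std::uint32_t m_ThisNode;
    std::uint32_t m_NumberOfElements;
    std::uint32_t m_MaxNumberOfElements;
    std::uint32_t m_ParentNode;
    std::uint32_t m_ParentElement;
    std::uint32_t m_FirstElement;
    std::uint32_t m_bBottomNode;
    std::uint32_t m_LastElement;
    NodeElement m_Elements[127];
};

// 8 header fields * 4 bytes = 32; each element is 20 + 3 * 4 = 32 bytes;
// 32 + 127 * 32 = 4096.
static_assert(sizeof(NodeElement) == 32, "20-byte key plus three 4-byte fields");
static_assert(sizeof(Node) == 4096, "32-byte header plus 127 elements of 32 bytes");

// Copies at most the first 10 characters of a term into the key; shorter
// terms are padded with NUL, so the key ends in a NUL as described above.
void setKey(NodeElement& element, const std::u16string& term) {
    for (std::size_t i = 0; i < 10; ++i)
        element.key[i] = (i < term.size()) ? term[i] : u'\0';
}
```

Because every member is naturally aligned, no padding is inserted, which is why the element count of 127 together with the 32-byte header fills one page exactly.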

4) Structure of Dictionary in Active Memory

When associated with a textual object, the aforementioned dictionary record is parsed into an active object, for example a C++ object having the following member variables, where “wstring” is a wide character (Unicode®) string according to MVS:

class Record
{
  wstring wsL1Term; //term from first language
  wstring wsL2Term; //term from second language
  wstring wsL3Term; //term from third language
  wstring wsL4Term; //term from fourth language
  wstring wsTopic; //topical code
};

Example IIb. Use of Multilingual Dictionary in a Single Level Method of Selecting Translations of Terms Using Hierarchical Topical Codes

1) Association of Dictionary Entries with a Textual Object in a First Language

Given a first language wstring wsLang1String of length lenL1S, at each position m in the wstring, the index file is searched for first language terms that match wsLang1String starting at position m, and copies of these dictionary records are then read into active memory as follows:

a) A CPtrArray paMaster (as described in MVS) is constructed with length lenL1S and initialized with NULL at each position in the array. In operation, this will become an array of arrays, with a CPtrArray paRecords of dictionary records at each position in this array.

Example of paMaster before adding dictionary records, Table 8:

TABLE 8
paMaster[0]              NULL
(intervening paMasters)  (all NULL)
paMaster[m]              NULL
paMaster[m + 1]          NULL
(intervening paMasters)  (all NULL)
paMaster[lenL1S-1]       NULL

In this example, NULL entries remain where no matching dictionary entries were found.

An example of a paRecord after adding dictionary records is shown below. In this case, three matching records were found and copied into active memory, Table 9.

TABLE 9
paRecord[0]  Pointer to first record
paRecord[1]  Pointer to second record
paRecord[2]  Pointer to third record

b) Starting from position 0 in wsLang1String, the index file is searched following the method described in Knuth. The records in the dictionary for which the first language term matches the substring of wsLang1String of the same length as the first language term are read into active memory as follows:

The record is parsed by sequentially reading each character of the record. Upon reading a <record> code, the program allocates memory for an instance of the class Entry. After reading the <lang1> code, the program places subsequent characters in wsL1Term until the </lang1> code is read. Likewise, after reading the <lang2> code, the program places subsequent characters in wsL2Term until the </lang2> code is read. After reading the <lang3> code, the program places subsequent characters in wsL3Term until the </lang3> code is read. After reading the <lang4> code, the program places subsequent characters in wsL4Term until the </lang4> code is read. Continuing, after reading the <topic> code, the program places subsequent characters in wsTopic until the </topic> code is read. Upon reading the </record> code, the program appends the pointer to this record to the CPtrArray at position 0. If this is the first record at that position, i.e., there is a NULL at that position, the program allocates a CPtrArray for the record and assigns its pointer to paMaster[0].

c) The procedure in b) is repeated for each position in wsLang1String. When these iterations are complete, each position m of paMaster will contain either a NULL (if no matching dictionary records were found) or a pointer to a CPtrArray of dictionary records.

Example of paMaster after adding dictionary records, Table 10:

TABLE 10
paMaster[0]              *paRecords (0th)
(intervening paMasters)  (may include NULL)
paMaster[m]              *paRecords (m-th)
paMaster[m + 1]          NULL
(intervening paMasters)  (may include NULL)
paMaster[lenL1S-1]       *paRecords ((lenL1S-1)-th) (may be NULL)
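The construction of paMaster in steps a) to c) can be sketched in standard C++ as follows. This is a minimal sketch under stated assumptions: a linear scan over an in-memory vector stands in for the index file search, an empty inner vector stands in for a NULL entry, pointers into the dictionary vector stand in for copied records, the Record struct is reduced to three fields, and the names RecordPtrArray and associate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Record {
    std::wstring wsL1Term;  // term from first language
    std::wstring wsL2Term;  // term from second language
    std::wstring wsTopic;   // topical code
};
typedef std::vector<const Record*> RecordPtrArray;

// Returns one array of matching records per position m of wsLang1String.
// The pointers refer to elements of `dictionary`, which must outlive the result.
std::vector<RecordPtrArray> associate(const std::wstring& wsLang1String,
                                      const std::vector<Record>& dictionary) {
    std::vector<RecordPtrArray> paMaster(wsLang1String.size());
    for (std::size_t m = 0; m < wsLang1String.size(); ++m) {
        for (std::size_t r = 0; r < dictionary.size(); ++r) {
            const std::wstring& term = dictionary[r].wsL1Term;
            // The term matches if the substring of the same length that
            // starts at position m is equal to it.
            if (!term.empty() && wsLang1String.compare(m, term.size(), term) == 0)
                paMaster[m].push_back(&dictionary[r]);
        }
    }
    return paMaster;
}
```

Positions at which more than one record is collected are exactly the positions that require the topical code comparison of the later steps.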

2) Inserting Topical Codes Into a Structure that Allows for Counting their Frequencies of Occurrence

For each member of the pointer array, a key consisting of the first three characters of the IPC code contained in wsTopic (i.e., the class code) is added to a map of the type defined by:

typedef map<wstring, int> STR2INT;

to get the map

STR2INT smapClassCode;

Following the teachings of MVS, the following C++ code segment is executed for each member of paMaster as follows:

for(int i = 0; i < lenL1S; i++)
{
  if(paMaster[i] != NULL)
  {
    int nRecords = paMaster[i].GetLength( ); //number of dictionary records copied
    CPtrArray* pRecords = paMaster[i];
    for(int j = 0; j < nRecords; j++)
    {
      Record* pRecord = pRecords[j];
      pair<wstring, int> newPair; //pair for insertion into smapClassCode
      newPair.first = pRecord->wsTopic.left(3);
      newPair.second = 1;
      smapClassCode.insert(newPair);
    }
  }
}

3) Selecting Translations of Terms in Second, Third and Fourth Languages Using Frequencies of Topical Codes

For each of the positions m of wsLang1String, if there is more than one matching first language term from the dictionary, the numbers of occurrences in smapClassCode of the first three characters of the IPC codes of the matching first language terms are compared. The term with the most frequent IPC key is selected.

For the particular example of paRecord shown above, the following C++ code segment selects the record with the most frequently occurring IPC code:

int nMostFrequent = 0; //initialize to first record in paRecord
int nMaxCount = 0; //for following the maximum count
for(int i = 0; i < paRecord.GetLength( ); i++)
{
  Record* pRecord = pRecords[i];
  wstring wsTest = pRecord->wsTopic.left(3);
  int nCount = smapClassCode.count(wsTest);
  if(nCount > nMaxCount)
  {
    nMostFrequent = i;
    nMaxCount = nCount;
  }
}

Note that in this example, if more than one IPC class code has the same frequency, the first of the tied records to occur in pRecords is selected.
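The selection step can be sketched in standard C++ as follows. It is a minimal sketch under stated assumptions: selectMostFrequent is a hypothetical name, the candidates are represented only by their topical code strings, and the map is assumed to store the accumulated frequency of each class key as its mapped value, which is read with find rather than count. The IPC classes paired with the translations in the usage note are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, int> STR2INT;

// Returns the index of the candidate whose class key (first three characters
// of its topical code) has the highest stored frequency; the first candidate
// wins on ties and when no key is present in the map.
std::size_t selectMostFrequent(const std::vector<std::wstring>& candidateTopics,
                               const STR2INT& smapClassCode) {
    std::size_t nMostFrequent = 0;
    int nMaxCount = 0;
    for (std::size_t i = 0; i < candidateTopics.size(); ++i) {
        STR2INT::const_iterator it = smapClassCode.find(candidateTopics[i].substr(0, 3));
        const int nCount = (it == smapClassCode.end()) ? 0 : it->second;
        if (nCount > nMaxCount) {  // strict comparison keeps the earlier record on ties
            nMostFrequent = i;
            nMaxCount = nCount;
        }
    }
    return nMostFrequent;
}
```

For example, if the candidates for “soshiki” carry the codes "D03D 1/00" (weave), "A61K 35/00" (tissue) and "C22C 38/00" (microstructure), and the surrounding text has produced frequencies of 3 for "A61" and 1 each for "C22" and "D03", the record for “tissue” is selected.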

EXAMPLE III Bilingual Electronic Dictionary Containing Multiple Hierarchical Topical Codes and Methods for Its Use

Example IIIa. Bilingual Electronic Dictionary Containing Multiple Hierarchical Topical Codes

1) Structure of Dictionary Record in Storage

With the character coding system being Unicode®, codes belonging to the user domain of the Unicode® system are selected as record and field delimiters as follows:

<record>   0xe000  //start of record
</record>  0xe001  //end of record
<lang1>    0xe011  //start of first language term field
</lang1>   0xe021  //end of first language term field
<lang2>    0xe013  //start of second language term field
</lang2>   0xe023  //end of second language term field
<topic>    0xe0a0  //start of topical code field
</topic>   0xe0a1  //end of topical code field

In the list above, 0xE011 is a hexadecimal representation of the two-byte (16-bit) code consisting of the hexadecimal bytes E0 and 11.

A dictionary entry then has the following sequence of 16-bit codes:

<record><lang1>Japanese term</lang1><lang2>English term</lang2><topic>IPC code 1</topic>(repeat n−1 times)<topic>IPC code n</topic></record>

The records in the dictionaries are selected and constructed by manually examining patents in many IPC classifications. For a particular patent in this Example, the language of the patent, Japanese, is taken as the first language and English is taken as the second language. For a given Japanese term in the patent, the Japanese term is entered into the first language term field, and the English translation of that term which is most appropriate for the topic of the patent is entered into the second language term field. The inventive IPC classifications for the patent are entered into the topical code fields.

A simple parse routine known to persons knowledgeable in the field is used to read the dictionary record into active memory as described below.

2) Dictionary File

The dictionary file consists of a sequence in storage of the above dictionary records. It also contains an array of offsets of the dictionary records so that the records can be addressed by the record location number contained in the node elements of the index file.

3) Index File

The index file for the aforesaid dictionary file is a multinodal tree made up of nodes and node elements as follows. There is one tree per language in the dictionary. If the dictionary is only to be used in one direction, for example, from the first language to the second language, only one tree for the source language need be present. Within each node, the node elements are arranged in Unicode® code order according to the key. Ordering of keys is determined by the Unicode® character order. The first 10 characters of a term are contained in the node element for speed. If the term is fewer than 10 characters in length, the key is terminated by a NULL. For terms 10 or more characters in length, the dictionary record is parsed into memory so that the full term can be compared. (This is a simple variant of the B-tree method disclosed in Donald E. Knuth, “The Art of Computer Programming, Volume 3, Sorting and Searching”, Addison-Wesley, Reading, Mass., 1973, pp. 473-476, hereinafter “Knuth”.) The choice of 127 elements makes the size of a node 4096 bytes, which is the size of a memory page of the Windows® NT operating system as implemented on Intel x86 microprocessors. In the index file, the nodes are aligned on page boundaries so that a single file access reads a complete node into physical memory.

a) Node Class:

class Node
{
  DWORD m_ThisNode; //number of this node
  DWORD m_NumberOfElements; //number of node elements currently in node
  DWORD m_MaxNumberOfElements;
  DWORD m_ParentNode;
  DWORD m_ParentElement; //number of parent element in parent node
  DWORD m_FirstElement;
  DWORD m_bBottomNode; //true if this is a bottom node
  DWORD m_LastElement; //last element in linked list
  NodeElement m_Elements[127];
};

b) Node Element Class:

class NodeElement
{
  wchar_t key[10]; //hold up to 10 characters from record
  DWORD childNode; //for keys less than this key
  DWORD next; //next element in node
  DWORD record; //number of record in dictionary
};

4) Structure of Dictionary in Active Memory

When associated with a textual object, the aforementioned dictionary record is parsed into an active object, for example a C++ object having the following member variables, where “wstring” is a wide character (Unicode®) string according to MVS:

class Record
{
  wstring wsL1Term; //term from first language
  wstring wsL2Term; //term from second language
  vector<wstring> vecTopics; //array of topical code wstrings
};

Example IIIb. Use of Bilingual Dictionary in a Single Level Method of Selecting Translations of Terms Using Multiple Hierarchical Topical Codes

1) Association of Dictionary Entries with a Textual Object in a First Language

Given a first language wstring wsLang1String of length lenL1S, at each position m in the wstring, the index file is searched for first language terms that match wsLang1String starting at position m, and copies of these dictionary records are then read into active memory as follows:

a) A CPtrArray paMaster (as described in MVS) is constructed with length lenL1S and initialized with NULL at each position in the array. In operation, this will become an array of arrays, with a CPtrArray paRecords of dictionary records at each position in this array.

Example of paMaster before adding dictionary records, Table 11:

TABLE 11
paMaster[0]              NULL
(intervening paMasters)  (all NULL)
paMaster[m]              NULL
paMaster[m + 1]          NULL
(intervening paMasters)  (all NULL)
paMaster[lenL1S-1]       NULL

b) Starting from position 0 in wsLang1String, the index file is searched following the method described in Knuth.

The records in the dictionary for which the first language term matches the substring of wsLang1String of the same length as the first language term are read into active memory as follows:

The record is parsed by sequentially reading each character of the record. Upon reading a <record> code, the program allocates memory for an instance of the class Entry. After reading the <lang1> code, the program places subsequent characters in wsL1Term until the </lang1> code is read. Likewise, after reading the <lang2> code, the program places subsequent characters in wsL2Term until the </lang2> code is read. Continuing, after reading the <topic> code, the program places subsequent characters in a wstring wsTopic until the </topic> code is read. This wstring is appended to the vector vecTopics. If there are additional topic codes, these are likewise read into a wstring wsTopic which is then appended to vecTopics. Upon reading the </record> code, the program appends the pointer to this record to the CPtrArray at position 0. If this is the first record at that position, i.e., there is a NULL at that position, the program allocates a CPtrArray for the record and assigns its pointer to paMaster[0].

c) The procedure in b) is repeated for each position in wsLang1String. When these iterations are complete, each position m of paMaster will contain either a NULL (if no matching dictionary records were found) or a pointer to a CPtrArray of dictionary records.

Example of paMaster after adding dictionary records, Table 12:

TABLE 12
paMaster[0]              *paRecords (0th)
(intervening paMasters)  (may include NULL)
paMaster[m]              *paRecords (m-th)
paMaster[m + 1]          NULL
(intervening paMasters)  (may include NULL)
paMaster[lenL1S-1]       NULL

In this example, NULL entries remain where no matching dictionary entries were found.

An example of a paRecord after adding dictionary records is shown below. In this case, three matching records were found and copied into active memory, Table 13.

TABLE 13
paRecord[0]  Pointer to first record
paRecord[1]  Pointer to second record
paRecord[2]  Pointer to third record

2) Inserting Topical Codes Into a Structure that Allows for Counting their Frequencies of Occurrence

For each member of the pointer array, a key consisting of the first three characters of the IPC code contained in wsTopic (i.e., the class code) is added to a map of the type defined by:

typedef map<wstring, int> STR2INT;

to get the map

STR2INT smapClassCode;

Following the teachings of MVS, the following C++ code segment is executed for each member of paMaster as follows:

for(int i = 0; i < lenL1S; i++)
{
  if(paMaster[i] != NULL)
  {
    int nRecords = paMaster[i].GetLength( ); //number of dictionary records copied
    CPtrArray* pRecords = paMaster[i];
    for(int j = 0; j < nRecords; j++)
    {
      Record* pRecord = pRecords[j];
      vector<wstring>& vecTopics = pRecord->vecTopics;
      for(int k = 0; k < vecTopics.size( ); k++)
      {
        wstring& wsTopic = vecTopics[k];
        pair<wstring, int> newPair; //pair for insertion into smapClassCode
        newPair.first = wsTopic.left(3);
        newPair.second = 1;
        smapClassCode.insert(newPair);
      }
    }
  }
}

3) Selecting Translations of Terms in Second Language Using Frequencies of Topical Codes

For each of the positions m of wsLang1String, if there is more than one matching first language term from the dictionary, the numbers of occurrences in smapClassCode of the first three characters of the IPC codes of the matching first language terms are compared. The term with the most frequent IPC key is selected.

For the particular example of paRecord shown above, the following C++ code segment selects the record with the most frequently occurring IPC code:

int nMostFrequent = 0; //initialize to first record in paRecord
int nMaxCount = 0; //for following the maximum count
for(int i = 0; i < paRecord.GetLength( ); i++)
{
  Record* pRecord = pRecords[i];
  wstring wsTest = pRecord->wsTopic.left(3);
  int nCount = smapClassCode.count(wsTest);
  if(nCount > nMaxCount)
  {
    nMostFrequent = i;
    nMaxCount = nCount;
  }
}

Note that in this example, if more than one IPC class code has the same frequency, the first of the tied records to occur in pRecords is selected.
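The listing above reads a single wsTopic, whereas the Record of Example IIIa holds its topical codes in vecTopics. One possible way of applying the same comparison to a record with several codes is sketched below. The scoring rule is an assumption made here for illustration and is not stated in the text: each record is scored by the highest frequency found among the class keys of its codes. The names bestClassCount and selectMostFrequent are hypothetical, and the map is assumed to store accumulated frequencies as mapped values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, int> STR2INT;

struct Record {
    std::wstring wsL1Term;                // term from first language
    std::wstring wsL2Term;                // term from second language
    std::vector<std::wstring> vecTopics;  // array of topical codes
};

// Assumed scoring rule: the highest class-level frequency among the record's codes.
int bestClassCount(const Record& record, const STR2INT& smapClassCode) {
    int best = 0;
    for (std::size_t k = 0; k < record.vecTopics.size(); ++k) {
        STR2INT::const_iterator it = smapClassCode.find(record.vecTopics[k].substr(0, 3));
        if (it != smapClassCode.end() && it->second > best) best = it->second;
    }
    return best;
}

// Returns the index of the best-scoring candidate (first one on ties).
std::size_t selectMostFrequent(const std::vector<Record>& candidates,
                               const STR2INT& smapClassCode) {
    std::size_t nMostFrequent = 0;
    int nMaxCount = 0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int nCount = bestClassCount(candidates[i], smapClassCode);
        if (nCount > nMaxCount) {
            nMostFrequent = i;
            nMaxCount = nCount;
        }
    }
    return nMostFrequent;
}
```

Other scoring rules, such as summing the frequencies of all of a record's codes, would fit the same structure; the choice between them is a design decision that the text leaves open.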

EXAMPLE IV Bilingual Electronic Dictionary Containing Nonhierarchical Topical Codes and Methods for Its Use

Example IVa. Bilingual Electronic Dictionary Containing Nonhierarchical Topical Codes

1) Structure of Dictionary Record in Storage

With the character coding system being Unicode®, codes belonging to the user domain of the Unicode® system are selected as record and field delimiters as follows:

<record>   0xe000  //start of record
</record>  0xe001  //end of record
<lang1>    0xe011  //start of first language term field
</lang1>   0xe021  //end of first language term field
<lang2>    0xe013  //start of second language term field
</lang2>   0xe023  //end of second language term field
<topic>    0xe0a0  //start of topical code field
</topic>   0xe0a1  //end of topical code field

In the list above, 0xE011 is a hexadecimal representation of the two-byte (16-bit) code consisting of the hexadecimal bytes E0 and 11.

A dictionary entry then has the following sequence of 16-bit codes:

<record><lang1>Japanese term</lang1><lang2>English term</lang2><topic>topical code</topic></record>

The records in the dictionaries are selected and constructed by manually examining patents in many topical areas. Each patent is manually assigned to one of the topic codes listed in Appendix A of U.S. Pat. No. 5,873,056. For a particular patent in this embodiment, the language of the patent, Japanese, is taken as the first language and English is taken as the second language. For a given Japanese term in the patent, the Japanese term is entered into the first language term field, and the English translation of that term which is most appropriate for the topic of the patent is entered into the second language term field. The manually assigned topic code for the patent is entered into the topical code field.

A simple parse routine known to persons knowledgeable in the field is used to read the dictionary record into active memory as described below.

2) Dictionary File

The dictionary file consists of a sequence in storage of the above dictionary records. It also contains an array of offsets of the dictionary records so that the records can be addressed by the record location number contained in the node elements of the index file.

3) Index File

The index file for the aforesaid dictionary file is a multinodal tree made up of nodes and node elements as follows. There is one tree per language in the dictionary. If the dictionary is only to be used in one direction, for example, from the first language to the second language, only one tree for the source language need be present. Within each node, the node elements are arranged in Unicode® code order according to the key. Ordering of keys is determined by the Unicode® character order. The first 10 characters of a term are contained in the node element for speed. If the term is fewer than 10 characters in length, the key is terminated by a NULL. For terms 10 or more characters in length, the dictionary record is parsed into memory so that the full term can be compared. (This is a simple variant of the B-tree method disclosed in Knuth.) The choice of 127 elements makes the size of a node 4096 bytes, which is the size of a memory page of the Windows® NT operating system as implemented on Intel x86 microprocessors. In the index file, the nodes are aligned on page boundaries so that a single file access reads a complete node into physical memory.

a) Node Class

class Node
{
  DWORD m_ThisNode; //number of this node
  DWORD m_NumberOfElements; //number of node elements currently in node
  DWORD m_MaxNumberOfElements;
  DWORD m_ParentNode;
  DWORD m_ParentElement; //number of parent element in parent node
  DWORD m_FirstElement;
  DWORD m_bBottomNode; //true if this is a bottom node
  DWORD m_LastElement; //last element in linked list
  NodeElement m_Elements[127];
};

b) Node Element Class:

class NodeElement
{
  wchar_t key[10]; //hold up to 10 characters from record
  DWORD childNode; //for keys less than this key
  DWORD next; //next element in node
  DWORD record; //number of record in dictionary
};

4) Structure of Dictionary in Active Memory

When associated with a textual object, the aforementioned dictionary record is parsed into an active object, for example a C++ object having the following member variables, where “wstring” is a wide character (Unicode®) string according to MVS:

class Record
{
  wstring wsL1Term; //term from first language
  wstring wsL2Term; //term from second language
  wstring wsTopic; //topical code
};

Example IVb. Use of Bilingual Dictionary in a Single Level Method of Selecting Translations of Terms Using Nonhierarchical Topical Codes

1) Association of Dictionary Entries with a Textual Object in a First Language

Given a first language wstring wsLang1String of length lenL1S, at each position m in the wstring, the index file is searched for first language terms that match wsLang1String starting at position m, and copies of these dictionary records are then read into active memory as follows:

a) A CPtrArray paMaster (as described in MVS) is constructed with length lenL1S and initialized with NULL at each position in the array. In operation, this will become an array of arrays, with a CPtrArray paRecords of dictionary records at each position in this array.

Example of paMaster before adding dictionary records, Table 14:

TABLE 14 paMaster[0] NULL (intervening paMasters) (all NULL) paMaster[m]NULL paMaster[m + 1] NULL (intervening paMasters) (all NULL)rnMaster[lenL1S-1] NULLb) Starting from position 0 in wsLang1String, the index file is searchedfollowing the method described in Knuth.

The records in the dictionary for which the first language term matchesthe substring of wsLang1String of the same length as the first languageterm are react into active memory as follows:

The record is parsed by sequentially reading each character of the record. Upon reading a <record> code, the program allocates memory for an instance of the class Record. After reading the <lang1> code, the program places subsequent characters in wsL1Term until the </lang1> code is read. Likewise, after reading the <lang2> code, the program places subsequent characters in wsL2Term until the </lang2> code is read. Continuing, after reading the <topic> code, the program places subsequent characters in wsTopic until the </topic> code is read. Upon reading the </record> code, the program appends the pointer to this record to the CPtrArray at position 0. If this is the first record at that position, i.e., there is a NULL at that position, the program allocates a CPtrArray for the record and assigns its pointer to paMaster[0].

c) The procedure in b) is repeated for each position in wsLang1String. When these iterations are complete, each position m of paMaster will contain either a NULL (if no matching dictionary records were found) or a pointer to a CPtrArray of dictionary records.

Example of paMaster after adding dictionary records, Table 15:

TABLE 15
paMaster[0]               *paRecords (0th)
(intervening paMasters)   (may include NULL)
paMaster[m]               *paRecords (mth)
paMaster[m + 1]           NULL
(intervening paMasters)   (may include NULL)
paMaster[lenL1S-1]        NULL

In this example, NULL entries remain where no matching dictionary entries were found.
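The parsing of a tagged dictionary record into its three member strings can be sketched as follows. This is a minimal illustration, not the program described above: it locates each pair of codes by substring search instead of reading character by character, it uses standard C++17 in place of the MVS classes, and the helper names extractField and parseRecord are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>

struct Record {
    std::wstring wsL1Term;  // term from first language
    std::wstring wsL2Term;  // term from second language
    std::wstring wsTopic;   // topical code
};

// Hypothetical helper: return the text between <tag> and </tag>, or an
// empty optional if either code is missing.
std::optional<std::wstring> extractField(const std::wstring& text,
                                         const std::wstring& tag) {
    const std::wstring open = L"<" + tag + L">";
    const std::wstring close = L"</" + tag + L">";
    const std::size_t begin = text.find(open);
    if (begin == std::wstring::npos) return std::nullopt;
    const std::size_t start = begin + open.size();
    const std::size_t end = text.find(close, start);
    if (end == std::wstring::npos) return std::nullopt;
    return text.substr(start, end - start);
}

// Hypothetical helper: build a Record from one <record>...</record> block.
// Returns an empty optional if any of the expected codes is absent.
std::optional<Record> parseRecord(const std::wstring& text) {
    const auto body = extractField(text, L"record");
    if (!body) return std::nullopt;
    const auto l1 = extractField(*body, L"lang1");
    const auto l2 = extractField(*body, L"lang2");
    const auto topic = extractField(*body, L"topic");
    if (!l1 || !l2 || !topic) return std::nullopt;
    return Record{*l1, *l2, *topic};
}
```

Returning an empty optional for an incomplete record is a design choice of this sketch; the text does not say how malformed records are handled.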

An example of a paRecord after adding dictionary records is shown in Table 16 below. In this case, three matching records were found and copied into active memory.

TABLE 16
paRecord[0]   Pointer to first record
paRecord[1]   Pointer to second record
paRecord[2]   Pointer to third record
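The association of dictionary records with each position of the first language string can be sketched as follows. This is an illustration under loud assumptions: a linear scan over an in-memory std::vector replaces the index file search, std::vector replaces CPtrArray, an empty inner vector plays the role of a NULL entry, and the names Master and associateRecords are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Record {
    std::wstring wsL1Term;  // term from first language
    std::wstring wsL2Term;  // term from second language
    std::wstring wsTopic;   // topical code
};

// Stand-in for paMaster: one list of matching records per text position.
using Master = std::vector<std::vector<const Record*>>;

// Hypothetical helper: for every position m, collect the records whose
// first language term matches the text starting at m.
Master associateRecords(const std::wstring& wsLang1String,
                        const std::vector<Record>& dictionary) {
    Master master(wsLang1String.size());  // every position starts empty
    for (std::size_t m = 0; m < wsLang1String.size(); ++m) {
        for (const Record& rec : dictionary) {
            const std::wstring& term = rec.wsL1Term;
            // compare() clamps the length at the end of the text, so a term
            // longer than the remaining text simply fails to match.
            if (!term.empty() &&
                wsLang1String.compare(m, term.size(), term) == 0) {
                master[m].push_back(&rec);
            }
        }
    }
    return master;
}
```

The returned pointers refer to elements of the dictionary vector, so that vector must outlive the Master structure.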

2) Inserting Topical Codes into a Structure that Allows for Counting their Frequencies of Occurrence

For each member of the pointer array, a key consisting of the entire topical code contained in wsTopic is added to the map defined by:

typedef map<wstring, int> STR2INT;

to get the map

STR2INT smapCode;

Following the teachings of MVS, the following C++ code segment is executed for each member of paMaster:

for( int i = 0; i < lenL1S; i++ )
{
  if( paMaster[i] != NULL )
  {
    CPtrArray* pRecords = (CPtrArray*) paMaster[i];
    int nRecords = (int) pRecords->GetSize( ); //number of dictionary records copied
    for( int j = 0; j < nRecords; j++ )
    {
      Record* pRecord = (Record*) pRecords->GetAt( j );
      //the key is the entire code because the codes are nonhierarchical
      smapCode[ pRecord->wsTopic ]++; //adds the key if absent, then counts this occurrence
    }
  }
}
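The frequency count can be illustrated with standard C++ alone. In the sketch below, which is an illustration only, the helper name countTopicalCodes is hypothetical and a plain vector of topical codes stands in for the records held in paMaster. With std::map, inserting a key that is already present leaves the stored value unchanged, so a tally needs an explicit increment.

```cpp
#include <map>
#include <string>
#include <vector>

using STR2INT = std::map<std::wstring, int>;

// Hypothetical helper: tally how often each topical code occurs.
// 'topics' holds the wsTopic of every record copied into active memory.
STR2INT countTopicalCodes(const std::vector<std::wstring>& topics) {
    STR2INT smapCode;
    for (const std::wstring& topic : topics) {
        // operator[] creates a missing entry with value 0 before the increment.
        ++smapCode[topic];
    }
    return smapCode;
}
```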

3) Selecting Translations of Terms in Second Language Using Frequencies of Topical Codes

For each of the positions m of wsLang1String, if there is more than one matching first language term from the dictionary, the numbers of occurrences in smapCode of the topical codes for the matching first language terms are compared. The term with the most frequent topical code is selected.

For the particular example of paRecord shown above, the following code segment selects the record with the most frequently occurring topical code:

int nMostFrequent = 0; //initialize to first record in paRecord
int nMaxCount = 0; //for following the maximum count
for( int i = 0; i < paRecord.GetSize( ); i++ )
{
  Record* pRecord = (Record*) paRecord[i];
  wstring wsTest = pRecord->wsTopic; //entire code because nonhierarchical
  map<wstring, int>::iterator it = smapCode.find( wsTest );
  int nCount = ( it != smapCode.end( ) ) ? it->second : 0; //frequency stored for this code
  if( nCount > nMaxCount )
  {
    nMostFrequent = i;
    nMaxCount = nCount;
  }
}

Note that in this example, if more than one topical code has the same frequency, the first of the records to occur in paRecord is selected.
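The selection step, including its tie behaviour, can be sketched in standard C++ as follows. This is an illustration only: the helper name selectMostFrequent is hypothetical, and std::vector stands in for the CPtrArray of candidate records.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Record {
    std::wstring wsL1Term;  // term from first language
    std::wstring wsL2Term;  // term from second language
    std::wstring wsTopic;   // topical code
};

using STR2INT = std::map<std::wstring, int>;

// Hypothetical helper: index of the candidate whose topical code has the
// highest count. The strict '>' keeps the earliest candidate on a tie, and
// index 0 is returned when no candidate's code has been counted.
std::size_t selectMostFrequent(const std::vector<Record>& candidates,
                               const STR2INT& smapCode) {
    std::size_t best = 0;
    int maxCount = 0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const auto it = smapCode.find(candidates[i].wsTopic);
        const int count = (it != smapCode.end()) ? it->second : 0;
        if (count > maxCount) {
            best = i;
            maxCount = count;
        }
    }
    return best;
}
```

For the three translations of “soshiki” mentioned in the background, the candidate whose topical code dominates the surrounding text would be returned; the topical codes used to exercise the sketch are placeholders, not codes from any real classification.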

I claim:
1. An electronic dictionary comprising: a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term, a second term, and a topical code, wherein said first term is in a first language, said second term is in a second language and said topical code indicates a topical area in which the second term is a translation of the first term, a record in the data structure having been generated by (a) providing a collection of textual objects in the first or second language, each textual object having been associated with one or more codes that identify a topic to which the textual object pertains, and (b) entering into the record representations of a term from the textual object, a topical code associated with the textual object, and the corresponding term from a translation of the textual object into the other language.
2. A method of using an electronic dictionary of claim 1 for, said electronic dictionary comprising: a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term, a second term, and a topical code, wherein said first term is in a first language, said second term is in a second language and said topical code indicates a topical area in which the second term is a translation of the first term, a record in the data structure having been generated by (a) providing a collection of textual objects in the first or second language, each textual object having been associated with one or more codes that identify a topic to which the textual object pertains, and (b) entering into the record representations of a term from the textual object, a topical code associated with the textual object, and the corresponding term from a translation of the textual object into the other language, wherein said electronic dictionary is used to automatically translate text by selecting topic-appropriate translations of terms in a textual object in a first language into a second language comprising the steps of from terms in a second language, the method comprising: (a) providing an said electronic dictionary of claim 1 containing records comprising representations of terms in said first language and said second language; (b) scanning a textual object in said first language to identify each occurrence of a term in said textual object in a record of said electronic dictionary; (c) inserting each topical code associated with each of said records identified in step (b) into a data structure that provides for counting of the frequency of occurrence of each topical code; and (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, selecting the term associated with the most frequently occurring topical code.
3. The method of claim 2, wherein step (c) is performed by generating a table associating each topical code occurring in the textual object with its frequency of occurrence.
4. The method of claim 2, wherein step (c) is performed by the use of a map class.
5. The electronic dictionary of claim 1, each record further comprising a representation of the part of speech for said first term.
6. The electronic dictionary of claim 1, each record further comprising representations of: the language of said first term and the language of said second term.
7. The electronic dictionary of claim 1, wherein at least some of said records further comprise a representation of a third term, said third term being a translation of the first term into a third language in the topical area of said topical code.
8. The electronic dictionary of claim 1, wherein one of said languages is Japanese and another language is English.
9. The electronic dictionary of claim 1, wherein at least one of said terms is represented in Unicode.
10. A method of using an electronic dictionary of claim 9 for, said electronic dictionary comprising: a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term, a second term, and a topical code, wherein said first term is in a first language, said second term is in a second language and said topical code indicates a topical area in which the second term is a translation of the first term, a record in the data structure having been generated by (a) providing a collection of textual objects in the first or second language, each textual object having been associated with one or more codes that identify a topic to which the textual object pertains, and (b) entering into the record representations of a term from the textual object, a topical code associated with the textual object, and the corresponding term from a translation of the textual object into the other language, wherein at least one of said terms is represented in Unicode and wherein said electronic dictionary is used to translate text by selecting topic-appropriate translations of terms in a textual object in a first language into a second language comprising the steps of: (a) providing an said electronic dictionary of claim 9 containing records comprising representations of terms in said first language and said second language; (b) scanning a textual object in said first language to identify each occurrence of a term in said textual object in a record of said electronic dictionary; (c) inserting each topical code associated with each of said records identified in step (b) into a data structure that provides for counting of the frequency of occurrence of each topical code; and (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, selecting the term associated with the most frequently occurring topical code.
11. The electronic dictionary of claim 1, wherein said topical code is a member of a hierarchical topical coding system.
12. The electronic dictionary of claim 11, wherein said hierarchical topical coding system is the International Patent Classification system.
13. A method of using an electronic dictionary of claim 11, said electronic dictionary comprising: a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term, a second term, and a topical code, wherein said first term is in a first language, said second term is in a second language and said topical code indicates a topical area in which the second term is a translation of the first term, a record in the data structure having been generated by (a) providing a collection of textual objects in the first or second language, each textual object having been associated with one or more codes that identify a topic to which the textual object pertains, and (b) entering into the record representations of a term from the textual object, a topical code associated with the textual object, and the corresponding term from a translation of the textual object into the other language, wherein said topical code is a member of a hierarchical topical coding system and wherein said electronic dictionary is used for selecting topic-appropriate translations of terms in a textual object in a first language into a second language comprising the steps of: (a) providing an said electronic dictionary of claim 11 containing records comprising representations of terms in said first language and said second language; (b) scanning a textual object in said first language to identify each occurrence of a term in said textual object in a record of said electronic dictionary; (c) inserting each topical code associated with each of said records identified in step (b) into a plurality of data structures that provide for counting of the frequency of occurrence of each hierarchical topical code at a plurality of levels of said hierarchy; and (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, counting the frequency of the topical codes associated with each of said plurality of terms, and selecting the term associated with the most frequently occurring topical code; wherein step (d) is applied iteratively, first at a coarser level of the topical code hierarchy, then at successively more detailed levels of the topical code hierarchy until topical ambiguities are either completely resolved or resolved to the extent allowed by the most detailed level of the hierarchy.
14. The method of claim 13, wherein step (c) is performed by generating a plurality of tables associating at each of a plurality of selected levels of said hierarchy each topical code occurring in the textual object with its frequency of occurrence.
15. The method of claim 13, wherein step (c) is performed by the use of a plurality of map classes.
16. The electronic dictionary of claim 11, wherein at least one of said terms is represented in Unicode.
17. A method of using an electronic dictionary of claim 16 for selecting topic-appropriate translations of terms in a textual object in a first language into a second language comprising the steps of: (a) providing an electronic dictionary of claim 16 containing records comprising representations of terms in said first language and said second language; (b) scanning a textual object in said first language to identify each occurrence of a term in said textual object in a record of said electronic dictionary; (c) inserting each topical code associated with each of said records identified in step (b) into a plurality of data structures that provide for counting of the frequency of occurrence of each hierarchical topical code at a plurality of levels of said hierarchy; and (d) whenever there occur a plurality of terms in the second language corresponding to a term in the first language, counting the frequency of the topical codes associated with each of said plurality of terms, and selecting the term associated with the most frequently occurring topical code; wherein step (d) is applied iteratively, first at a coarser level of the topical code hierarchy, then at successively more detailed levels of the topical code hierarchy until topical ambiguities are either completely resolved or resolved to the extent allowed by the most detailed level of the hierarchy.
18. The electronic dictionary of claim 11, wherein said hierarchical topical coding system is a technology classification system.
19. An electronic translator comprising: an electronic dictionary comprising a memory that contains a data structure composed of a plurality of records, each record comprising representations of a first term, a second term provided in a second language, and a topical code for records having plural meanings within a scope defined by the topical code, said topical code indicating a topical area of text for translation of the second term from the first term, and means for providing translation of a textual object composed of representations of terms in said first language into representations of terms in said second language, said means for providing translation comprising: (a) means for scanning a textual object in said first language to identify each occurrence of a term in said textual object in a record of said data structure; (b) means for inserting each topical code associated with each of said records identified in step (a) into a data structure that provides for counting of the frequency of occurrence of each topical code; and (c) means for selecting the term associated with the most frequently occurring topical code whenever there occur a plurality of terms in the second language corresponding to a term in the first language.
20. A method of generating a record in a translation dictionary composed of a plurality of records, each record representing a first term, a second term, and a topical code, wherein said first term is in a first language, said second term is in a second language and said topical code indicates a topical area in which the second term is a translation of the first term by: (a) providing a collection of textual objects in the first or second language, each textual object having been associated with one or more codes that identify a topic to which the textual object pertains, and (b) entering into the record representations of a term from the textual object, a topical code associated with the textual object, and the corresponding term from a translation of the textual object into the other language.
21. The method of claim 2, wherein said topical code is selected from a classification system comprising at least four members.
22. The method of claim 13, each said successively more detailed level comprising members of a plurality of coarser level hierarchical topics of said coarser level.
23. The electronic translator of claim 19, wherein said topical code is, or is derived from, the enumerative notation of an enumerative classification system.
24. The electronic translator of claim 23, wherein the format of said hierarchical topical code enables hierarchical iteration by means of selecting a substring of each hierarchical topical code.
25. The electronic translator of claim 24, wherein said hierarchical topical code and said at least one lower level topical code are members of the International Patent Classification system.
26. The electronic translator of claim 19, wherein said topical code is a member of a hierarchical topical coding system.
27. The electronic translator of claim 19, wherein said topical code is selected from a classification system comprising at least four members.
28. The electronic translator of claim 19, said means for selecting the term associated with the most frequently occurring topical code further comprising a means for counting the frequency of the topical codes associated with each of said plurality of terms, and selecting the term associated with the most frequently occurring topical code.
29. The electronic translator of claim 28, wherein said means for counting is applied iteratively, first at a coarser level of the topical code hierarchy, then at successively more detailed levels of the topical code hierarchy until topical ambiguities are either completely resolved or resolved to the extent allowed by the most detailed level of the hierarchy.
30. The electronic translator of claim 29, wherein each said successively more detailed level comprising members of a plurality of coarser level hierarchical topics of said coarser level.